Client Report - What’s in a Name?

Unit 1 Task 2

Author

William Hardy

Show the code
import pandas as pd
import numpy as np
from lets_plot import *

LetsPlot.setup_html(isolated_frame=True)
Show the code
# Learn morea about Code Cells: https://quarto.org/docs/reference/cells/cells-jupyter.html

# Include and execute your code here
df = pd.read_csv("https://github.com/byuidatascience/data4names/raw/master/data-raw/names_year/names_year.csv")

QUESTION 1

How does your name at your birth year compare to its use historically? Your must provide a chart. The years labels on your charts should not include a comma.

the chart shows that I was not quite born in a year where my name is popular. that the names popularity jumped after the time of WW2 with it’s highest peak in the year of 1951 with 52,877 williams.

Show the code
# Q1
from lets_plot import *

# filter for William
william = df.query('name == "William"')

# set your birth year
my_birth_year = 2000

# prepare ticks and labels as lists
year_ticks  = list(range(1880, 2025, 20))
year_labels = [str(y) for y in year_ticks]

# build the plot
plot = (
    ggplot(william, aes(x='year', y='Total')) +
    geom_line(size=1) +
    geom_vline(
        xintercept = my_birth_year,
        linetype   = 'dashed',
        size       = 1,
        color      = 'red'
    ) +
    scale_x_continuous(
        breaks = year_ticks,    # list of ints
        labels = year_labels    # list of matching strings
    ) +
    labs(
        title   = "Babies named William in the U.S., 1880–2023",
        x       = "Year",
        y       = "Count of Williams",
        caption = "The red line indicates the year I was born."
    )
)

# display it
plot

QUESTION 2

If you talked to someone named Brittany on the phone, what is your guess of his or her age? What ages would you not guess? Try to justify your answer with whatever statistics knowledge you have. You must provide a chart. The years labels on your charts should not include a comma.

I would say that if someone called brittany reached out to me then I would say that that could easily be in their 30’s. the reason why is becasue the max amount of people

Show the code
# Q2
from lets_plot import *

# --- filter for Brittany ---
brittany = df.query('name == "Brittany"')

# --- determine peak year & count ---
peak_row    = brittany.loc[brittany['Total'].idxmax()]
peak_year   = int(peak_row['year'])
peak_count  = int(peak_row['Total'])

# --- prepare ticks & labels ---
year_ticks  = list(range(1880, 2025, 20))
year_labels = [str(y) for y in year_ticks]

# --- build the plot ---
plot_q2 = (
    ggplot(brittany, aes(x='year', y='Total')) +
    geom_line(size=1) +
    geom_vline(
        xintercept = peak_year,
        linetype   = 'dashed',
        size       = 1,
        color      = 'blue'
    ) +
    scale_x_continuous(
        breaks = year_ticks,
        labels = year_labels
    ) +
    labs(
        title   = "Babies Named Brittany in the U.S., 1880–2023",
        subtitle= f"Peak in {peak_year} with {peak_count:,} Brittanies",
        x       = "Year",
        y       = "Count of Brittanies",
        caption = "Data: data4names | Blue line = peak popularity year"
    )
)

# --- display it ---
plot_q2